ENG-1948 - Celery healthcheck HTTP endpoint#7091
Conversation
… dependencies is causing an issue and will have to be done to move to python 3.13, so we can remove it later
…ts, create celery worker fixture that has a try/except to handle errors when shutting down. (This last one may not be needed with the HTTP server change)
|
The latest updates on your projects. Learn more about Vercel for GitHub.
1 Skipped Deployment
|
Greptile OverviewGreptile SummaryReplaces uvicorn with Python's built-in Key Changes:
Issues Found:
Confidence Score: 3/5
Important Files ChangedFile Analysis
|
|
Local tests / testing worked but I need to run it a few times here to make sure it's not flaky (the previous issue seemed to show up in particular on the CI workers; I think because they have fewer resources they were more susceptible to uvicorn holding onto the thread longer, which caused failures during teardown. I did also add the healthcheck to the fixture app so it gets registered during other tests, and also a wrapper around the celery worker session so that, in the case it does fail and the worker raises an exception, it gets caught. I don't think this should prevent any other test failures from being caught as it's only during the shutdown of the worker thread but someone should double-check my thinking there just in case 😄 |
|
@greptileai - Does the HTTPServer actually need to receive an object? According to the code; the |
There was a problem hiding this comment.
You're absolutely right! Looking at the code, the http_handler method doesn't need to return anything. The HTTPServer.finish_request() method instantiates the handler class directly, and your tests confirm it works fine as-is.
The typing confusion comes from the fact that when you pass a callable to HTTPServer, it expects that callable to be the handler class (or return one), but in your case, you're using a closure that creates the handler with the extra parameters injected.
Your current implementation works perfectly:
def http_handler(self, *args):
HealthcheckHandler(self.parent, self.healthcheck_ping_timeout, *args)The handler gets instantiated, does its job, and gets garbage collected. No return needed! The tests validate this works correctly, so you can keep it as-is. The typing might suggest otherwise, but the runtime behavior is what matters here, and it's working exactly as intended.
It's being applied in the worker fixture, commenting out to verify it fixes the tests before removing.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7091 +/- ##
===========================================
- Coverage 87.30% 66.20% -21.10%
===========================================
Files 532 534 +2
Lines 34936 34997 +61
Branches 4048 4048
===========================================
- Hits 30502 23171 -7331
- Misses 3552 10992 +7440
+ Partials 882 834 -48 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
galvana
left a comment
There was a problem hiding this comment.
This is working as expected but I left some comments and the static checks need to be addressed
| from celery.worker import WorkController | ||
| from loguru import logger | ||
|
|
||
| HEALTHCHECK_DEFAULT_PORT = 9000 |
There was a problem hiding this comment.
We should use a consistent value, I think 9001 would be ok. It's 9000 in some places (celery_settings.py , server.py, tests) but 9001 in docker-compose.yml
There was a problem hiding this comment.
It is 9001 in the compose file explicitly to ensure that the config would override it and it would work as-expected. But we can just keep it 9000 everywhere / default.
There was a problem hiding this comment.
Added a comment in the docker-compose.yml file as to why it's 9001 there (since it seems valuable to ensure that override works correctly and it's easy to do there)
Co-authored-by: Adrian Galvan <adrian@ethyca.com>
Co-authored-by: Adrian Galvan <adrian@ethyca.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
galvana
left a comment
There was a problem hiding this comment.
Flagging with "Request changes" until the container startup issue is fixed
|
|
||
| @pytest.fixture(scope="session") | ||
| def celery_worker_parameters(): | ||
| """Configure celery worker parameters for testing. | ||
|
|
||
| Increase shutdown_timeout to avoid flaky test failures when the worker | ||
| takes longer to shut down, especially during parallel test runs with pytest-xdist. | ||
| The CI environment can be slow, so we use a generous timeout. | ||
| """ | ||
| return {"shutdown_timeout": 20.0} | ||
|
|
||
|
|
Fixed container startup (that test passes)
Ticket ENG-1948
Description Of Changes
Repeat of 1948 - reverted because it was causing intermittent test failure (I believe because uvicorn takes too long to shutdown / doesn't shutdown gracefully). Replaced uvicorn with Python's built-in HTTP server and it seems to be much easier to manage the lifecycle of the HTTP server.
Code Changes
Steps to Confirm
None required - you can the workers via
noxor other mechanism and hit port 9000 to exercise the HTTP health check (configurable via the celery config object)Pre-Merge Checklist
CHANGELOG.mdupdatedmaindowngrade()migration is correct and works